LSI vs. Wordnet Ontology in Dimension Reduction for Information Retrieval

نویسندگان

  • Pavel Moravec
  • Michal Kolovrat
  • Václav Snásel
چکیده

In the area of information retrieval, the dimension of document vectors plays an important role. Firstly, with higher dimensions index structures suffer the “curse of dimensionality” and their efficiency rapidly decreases. Secondly, we may not use exact words when looking for a document, thus we miss some relevant documents. LSI (Latent Semantic Indexing) is a numerical method, which discovers latent semantic in documents by creating concepts from existing terms. However, it is hard to compute LSI. In this article, we offer a replacement of LSI with a projection matrix created fromWordNet hierarchy and compare it with LSI.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using BFA with wordnet ontology based model for web retrieval

In the area of information retrieval, the dimension of document vectors plays an important role. We may need to find a few words or concepts, which characterize the document based on its contents, to overcome the problem of the "curse of dimensionality", which makes indexing of highdimensional data problematic. To do so, we earlier proposed a Wordnet and Wordnet+LSI (Latent Semantic Indexing) b...

متن کامل

Explicit vs. Latent Concept Models for Cross-Language Information Retrieval

The field of information retrieval and text manipulation (classification, clustering) still strives for models allowing semantic information to be folded in to improve performance with respect to standard bag-of-word based models. Many approaches aim at a concept-based retrieval, but differ in the nature of the concepts, which range from linguistic concepts as defined in lexical resources such ...

متن کامل

Lower Dimensional Representation of Text Data in Vector

Dimension reduction in today's vector space based information retrieval system is essential for improving computational eeciency in handling massive data. In this paper, we propose a mathematical framework for lower dimensional representation of text data in vector space based information retrieval using minimization and matrix rank reduction formula. We illustrate how the commonly used Latent ...

متن کامل

Explicit Versus Latent Concept Models for Cross-Language Information Retrieval

The field of information retrieval and text manipulation (classification, clustering) still strives for models allowing semantic information to be folded in to improve performance with respect to standard bag-of-word based models. Many approaches aim at a concept-based retrieval, but differ in the nature of the concepts, which range from linguistic concepts as defined in lexical resources such ...

متن کامل

Public Transport Ontology for Passenger Information Retrieval

Passenger information aims at improving the user-friendliness of public transport systems while influencing passenger route choices to satisfy transit user’s travel requirements. The integration of transit information from multiple agencies is a major challenge in implementation of multi-modal passenger information systems. The problem of information sharing is further compounded by the multi-l...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004